
Heart disease prediction


Interpretable Heart Disease Prediction via a Weighted Ensemble Model: A Large-Scale Study with SHAP and Surrogate Decision Trees

Hasnat, Md Abrar, Jobayer, Md, Shawon, Md. Mehedi Hasan, Alam, Md. Golam Rabiul

arXiv.org Artificial Intelligence

Cardiovascular disease (CVD) remains a critical global health concern, demanding reliable and interpretable predictive models for early risk assessment. This study presents a large-scale analysis using the Heart Disease Health Indicators Dataset, developing a strategically weighted ensemble model that combines tree-based methods (LightGBM, XGBoost) with a Convolutional Neural Network (CNN) to predict CVD risk. The model was trained on a preprocessed dataset of 229,781 patients; the inherent class imbalance was managed through strategic weighting, and feature engineering expanded the original 22 features to 25. The final ensemble achieves a statistically significant improvement over the best individual model, with a test AUC of 0.8371 (p = 0.003), and its high recall of 80.0% makes it particularly suited for screening. To provide transparency and clinical interpretability, surrogate decision trees and SHapley Additive exPlanations (SHAP) are used. By blending diverse learning architectures and incorporating explainability through SHAP and surrogate decision trees, the proposed model combines robust predictive performance with clinical transparency, making it a strong candidate for real-world deployment in public health screening.
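The abstract does not spell out how the three models are combined. One common way to build a weighted ensemble is weighted soft voting: average each model's predicted probability of CVD using per-model weights, then apply a decision threshold. The sketch below assumes that scheme; the weights, threshold, toy probabilities, and helper names (`weighted_soft_vote`, `apply_threshold`, `recall`) are hypothetical and are not taken from the paper. It also shows how recall, the metric the abstract emphasizes for screening, is computed.

```python
from __future__ import annotations

from typing import Sequence


def weighted_soft_vote(prob_lists: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> list[float]:
    """Weighted average of per-model positive-class probabilities (assumed scheme)."""
    if len(prob_lists) != len(weights):
        raise ValueError("expected one weight per model")
    total = sum(weights)
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]


def apply_threshold(probs: Sequence[float], threshold: float = 0.5) -> list[int]:
    """Convert probabilities to 0/1 labels; lowering the threshold raises recall."""
    return [1 if p >= threshold else 0 for p in probs]


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of actual positives the model flags (sensitivity)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical probabilities standing in for LightGBM, XGBoost, and CNN outputs.
lgbm = [0.9, 0.2, 0.6, 0.3]
xgb = [0.8, 0.3, 0.4, 0.2]
cnn = [0.7, 0.1, 0.6, 0.6]
weights = [0.4, 0.4, 0.2]  # hypothetical weights, not the paper's

ensemble = weighted_soft_vote([lgbm, xgb, cnn], weights)
preds = apply_threshold(ensemble, threshold=0.5)
print(ensemble, preds, recall([1, 0, 1, 1], preds))
```

For a screening use case, the threshold is the natural knob: moving it below 0.5 flags more patients, increasing recall at the cost of more false positives.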


A Comprehensive Machine Learning Framework for Heart Disease Prediction: Performance Evaluation and Future Perspectives

Lamir, Ali Azimi, Razzagzadeh, Shiva, Rezaei, Zeynab

arXiv.org Artificial Intelligence

This study presents a machine learning-based framework for heart disease prediction using the heart-disease dataset, comprising 303 samples with 14 features. The methodology involves data preprocessing, model training, and evaluation using three classifiers: Logistic Regression, K-Nearest Neighbors (KNN), and Random Forest. Hyperparameter tuning with GridSearchCV and RandomizedSearchCV was employed to enhance model performance. The Random Forest classifier outperformed the other models, achieving an accuracy of 91% and an F1-score of 0.89. Precision, recall, and the confusion matrix revealed balanced performance across classes. The proposed model demonstrates strong potential for aiding clinical decision-making by effectively predicting heart disease. Limitations such as dataset size and generalizability underscore the need for future studies using larger and more diverse datasets. This work highlights the utility of machine learning in healthcare, offering insights for further advancements in predictive diagnostics.
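Accuracy, precision, recall, and F1 all derive from the four confusion-matrix counts. A minimal standard-library sketch for the binary case, using hypothetical toy labels rather than the paper's data (the function names are illustrative):

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> dict[str, int]:
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            counts["tp"] += 1
        elif p == positive:          # predicted positive, actually negative
            counts["fp"] += 1
        elif t == positive:          # predicted negative, actually positive
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                           positive: int = 1) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 computed from the confusion counts."""
    c = confusion_counts(y_true, y_pred, positive)
    precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (c["tp"] + c["tn"]) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = heart disease, 0 = no heart disease.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))
print(classification_metrics(y_true, y_pred))
```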


CardioTabNet: A Novel Hybrid Transformer Model for Heart Disease Prediction using Tabular Medical Data

Sumon, Md. Shaheenur Islam, Islam, Md. Sakib Bin, Rahman, Md. Sohanur, Hossain, Md. Sakib Abrar, Khandakar, Amith, Hasan, Anwarul, Murugappan, M, Chowdhury, Muhammad E. H.

arXiv.org Artificial Intelligence

The early detection and prediction of cardiovascular diseases are crucial for reducing the severe morbidity and mortality associated with these conditions worldwide. Transformers use a multi-headed self-attention mechanism, widely adopted in natural language processing (NLP), to capture interactions within a feature space. However, the relationships between various features within biological systems remain ambiguous in these spaces. We address this issue with CardioTabNet, which exploits the strength of the tab transformer to extract a feature space that carries a strong understanding of clinical cardiovascular data and its feature ranking. As a result, the performance of downstream classical models improved significantly. Our study uses an open-source heart disease prediction dataset with 1190 instances and 11 features. The 11 features are divided into numerical (age, resting blood pressure, cholesterol, maximum heart rate, old peak, weight, and fasting blood sugar) and categorical (resting ECG, exercise angina, and ST slope) groups. The tab transformer was used to extract important features, which were then ranked using the random forest (RF) feature ranking algorithm. Ten machine learning models were used to predict heart disease from the selected features. After high-quality features were extracted, the top downstream model (a hyper-tuned ExtraTree classifier) achieved an average accuracy of 94.1% and an average Area Under the Curve (AUC) of 95.0%. Furthermore, a nomogram analysis was conducted to evaluate the model's effectiveness in cardiovascular risk assessment. A benchmarking study against state-of-the-art models was also conducted to evaluate the transformer-driven framework.


Feature selection strategies for optimized heart disease diagnosis using ML and DL models

Ahmad, Bilal, Chen, Jinfu, Chen, Haibao

arXiv.org Artificial Intelligence

Heart disease remains one of the leading causes of morbidity and mortality worldwide, necessitating the development of effective diagnostic tools to enable early diagnosis and clinical decision-making. This study evaluates the impact of three feature selection techniques (Mutual Information (MI), Analysis of Variance (ANOVA), and Chi-Square) on the predictive performance of various machine learning (ML) and deep learning (DL) models using a dataset of clinical indicators for heart disease. Eleven ML/DL models were assessed using precision, recall, AUC score, F1-score, and accuracy. Results indicate that MI outperformed the other methods, particularly for advanced models such as neural networks, which achieved the highest accuracy (82.3%) and a recall of 0.94. Logistic regression (accuracy 82.1%) and random forest (accuracy 80.99%) also performed better with MI. Simpler models such as Naive Bayes and decision trees achieved comparable results with ANOVA and Chi-Square, yielding accuracies of 76.45% and 75.99%, respectively, which makes them computationally efficient alternatives. Conversely, k-Nearest Neighbors (k-NN) and Support Vector Machines (SVM) performed worse, with accuracies between 51.52% and 54.43% regardless of the feature selection method. This study provides a comprehensive comparison of feature selection methods for heart disease prediction, demonstrating the critical role of feature selection in optimizing model performance. The results offer practical guidance for choosing a feature selection technique based on the classification algorithm, contributing to more accurate and efficient diagnostic tools for clinical decision-making in cardiology.
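Mutual-information feature selection scores each feature by how much knowing it reduces uncertainty about the label, then keeps the top-scoring features. The abstract does not describe the authors' estimator, so the sketch below is only an illustration: a plug-in estimate of I(X; Y) from empirical frequencies, which suits discrete or pre-binned features (for continuous features, scikit-learn's `mutual_info_classif` uses a nearest-neighbour estimator instead). The feature names and data are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (xv, yv), count in joint.items():
        p_xy = count / n
        # Sum of p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
        mi += p_xy * math.log(p_xy / ((px[xv] / n) * (py[yv] / n)))
    return mi


def select_top_k(features: dict[str, Sequence[Hashable]],
                 y: Sequence[Hashable], k: int) -> list[str]:
    """Rank features by MI with the label and keep the k highest."""
    scores = {name: mutual_information(col, y) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical data: feat_a tracks the label exactly, feat_b is independent of it.
label = [0, 0, 1, 1]
features = {"feat_a": [0, 0, 1, 1], "feat_b": [0, 1, 0, 1]}
print(select_top_k(features, label, k=1))  # ['feat_a']
```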


A Hybrid CNN-Transformer Model for Heart Disease Prediction Using Life History Data

Hao, Ran, Xiang, Yanlin, Du, Junliang, He, Qingyuan, Hu, Jiacheng, Xu, Ting

arXiv.org Artificial Intelligence

This study proposes a hybrid model combining a convolutional neural network (CNN) and a Transformer to predict and diagnose heart disease. Leveraging the CNN's strength in detecting local features and the Transformer's capacity to capture global relationships, the model can detect heart disease risk factors in high-dimensional life history data. Experimental results show that the proposed model outperforms traditional benchmark models such as the support vector machine (SVM), convolutional neural network (CNN), and long short-term memory network (LSTM) on several measures, including accuracy, precision, and recall. This demonstrates its strong ability to handle multi-dimensional and unstructured data. To verify the model's effectiveness, ablation experiments removing individual modules were carried out, and the results showed that both the CNN and Transformer modules are important to the model's performance. The paper also discusses incorporating additional features and approaches in future studies to improve performance and enable the model to operate effectively under diverse conditions. This study presents novel insights and methods for predicting heart disease using machine learning, with numerous potential applications, especially in personalized medicine and health management.


KACQ-DCNN: Uncertainty-Aware Interpretable Kolmogorov-Arnold Classical-Quantum Dual-Channel Neural Network for Heart Disease Detection

Jahin, Md Abrar, Masud, Md. Akmol, Mridha, M. F., Aung, Zeyar, Dey, Nilanjan

arXiv.org Artificial Intelligence

Heart failure remains a critical global health issue, contributing significantly to cardiovascular disease and accounting for 17.8 million deaths annually. The need for innovative diagnostic strategies is pressing, as classical machine learning models face challenges such as handling complex, high-dimensional data, class imbalance, poor categorical feature representations, limited performance on small datasets, and the absence of uncertainty quantification. Moreover, the interpretability of these models is often hindered by their 'black box' nature, complicating clinical trust and decision-making. While quantum machine learning shows potential, existing hybrid models have yet to fully capitalize on quantum advantages. To address these gaps, we propose the Kolmogorov-Arnold Classical-Quantum Dual-Channel Neural Network (KACQ-DCNN), a novel hybrid dual-channel neural network that integrates Kolmogorov-Arnold Networks (KANs) in place of traditional multilayer perceptrons, enabling univariate learnable activation functions on edges. As early adopters of KAN components, we observed that the approach significantly improved the network's ability to approximate continuous functions with reduced complexity and improved generalizability. Our comprehensive evaluation demonstrates that the KACQ-DCNN 4-qubit 1-layered model outperforms 37 benchmark models, including 16 classical machine learning models, 12 quantum neural networks, six hybrid models, and three variants of the KACQ-DCNN architecture. It achieved an accuracy of 92.03%, along with a macro-average precision, recall, and F1 score of 92.00%, representing significant improvements across all metrics. Moreover, KACQ-DCNN achieved a ROC-AUC score of 94.77%, supported by two-tailed paired t-tests against nine top-performing models, with a significance level (α) of 5% and a Bonferroni correction applied (α
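The abstract is cut off before it states the corrected threshold, so that value is not reproduced here. Generically, a paired t-test compares two models on the same evaluation units (for example, matched folds) through their per-pair score differences, and the Bonferroni correction guards against multiple comparisons by testing each of m p-values against α/m. The standard-library sketch below computes the paired t statistic and applies the Bonferroni rule; the per-fold scores and p-values are hypothetical. Converting t into a two-tailed p-value needs the t distribution with n - 1 degrees of freedom, which the standard library lacks; SciPy's `scipy.stats.ttest_rel` performs the full test.

```python
from __future__ import annotations

import math
import statistics
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t = mean(d) / (sd(d) / sqrt(n)) for paired differences d = a - b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


def bonferroni_significant(p_values: Sequence[float], alpha: float = 0.05) -> list[bool]:
    """Reject H0 for test i only if p_i < alpha / m, where m = number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical per-fold AUCs for a proposed model and one baseline.
proposed = [0.95, 0.94, 0.96, 0.95]
baseline = [0.93, 0.93, 0.94, 0.92]
print(paired_t_statistic(proposed, baseline))

# Hypothetical p-values from four model-vs-model comparisons.
print(bonferroni_significant([0.001, 0.004, 0.01, 0.2]))
```

Note that `paired_t_statistic` raises `ZeroDivisionError` if every difference is identical, since the standard deviation is then zero.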


Multi-class heart disease Detection, Classification, and Prediction using Machine Learning Models

Haque, Mahfuzul, Miah, Abu Saleh Musa, Gupta, Debashish, Prince, Md. Maruf Al Hossain, Alam, Tanzina, Sharmin, Nusrat, Ali, Mohammed Sowket, Shin, Jungpil

arXiv.org Artificial Intelligence

Heart disease is a leading cause of premature death worldwide, particularly among middle-aged and older adults, with men experiencing a higher prevalence. According to the World Health Organization (WHO), non-communicable diseases, including heart disease, account for 25% (17.9 million) of global deaths, with over 43,204 annual fatalities in Bangladesh. However, heart disease detection (HDD) systems tailored to the Bangladeshi population remain underexplored due to the lack of benchmark datasets and reliance on manual or limited-data approaches. This study addresses these challenges by introducing two new, ethically sourced HDD datasets, BIG-Dataset and CD dataset, which incorporate comprehensive data on symptoms, examination techniques, and risk factors. Using machine learning techniques including Logistic Regression and Random Forest, we achieved a testing accuracy of up to 96.6% with Random Forest. The proposed AI-driven system integrates these models and datasets to provide real-time, accurate diagnostics and personalized healthcare recommendations. By leveraging structured datasets and state-of-the-art machine learning algorithms, this research offers an innovative solution for scalable and effective heart disease detection, with the potential to reduce mortality rates and improve clinical outcomes.


Advancements In Heart Disease Prediction: A Machine Learning Approach For Early Detection And Risk Assessment

Ingole, Balaji Shesharao, Ramineni, Vishnu, Bangad, Nikhil, Ganeeb, Koushik Kumar, Patel, Priyankkumar

arXiv.org Artificial Intelligence

The primary aim of this paper is to understand, assess, and analyze the role, relevance, and efficiency of machine learning models in predicting heart disease risk from clinical data. Heart disease risk prediction is important in its own right, but applying machine learning (ML) to identify and evaluate how various features affect the classification of patients with and without heart disease, and to generate a reliable clinical dataset, is equally significant. This study relies primarily on cross-sectional clinical data. The ML approach is designed to improve how various clinical features are considered in the heart disease prognosis process, and some features emerge as strong predictors. The paper evaluates seven ML classifiers: Logistic Regression, Random Forest, Decision Tree, Naive Bayes, k-Nearest Neighbors, Neural Networks, and Support Vector Machine (SVM). Each model's performance is assessed by accuracy. Notably, the SVM achieves the highest accuracy, 91.51%, the best predictive performance among the evaluated models. The overall findings highlight the advantages of advanced computational methods in the evaluation, prediction, improvement, and management of cardiovascular risk. In particular, the strong performance of the SVM model illustrates its applicability and value in clinical settings, paving the way for further advancements in personalized medicine and healthcare.


Predicting Coronary Heart Disease Using a Suite of Machine Learning Models

Al-Karaki, Jamal, Ilono, Philip, Baweja, Sanchit, Naghiyev, Jalal, Yadav, Raja Singh, Khan, Muhammad Al-Zafar

arXiv.org Artificial Intelligence

Coronary Heart Disease affects millions of people worldwide and is a well-studied area of healthcare. There are many viable and accurate methods for the diagnosis and prediction of heart disease, but they have limitations such as invasiveness, late detection, or cost. Supervised machine learning offers a computationally low-cost, non-invasive approach that can serve as a precursor to early diagnosis. In this study, we applied several well-known methods and benchmarked their performance against each other. Random Forest with oversampling of the predictor variable produced the highest accuracy, 84%.
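The abstract does not name the oversampling method. "Oversampling of the predictor variable" most plausibly means resampling the minority class of the target label until the classes are balanced; that reading is an assumption. A minimal random-oversampling sketch in standard-library Python, with a hypothetical training split and an illustrative function name:

```python
from __future__ import annotations

import random
from collections import Counter
from typing import Sequence, TypeVar

T = TypeVar("T")


def random_oversample(X: Sequence[T], y: Sequence[int],
                      seed: int = 0) -> tuple[list[T], list[int]]:
    """Duplicate randomly chosen rows of each smaller class until every class
    matches the size of the largest one (assumed reading of 'oversampling')."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        indices = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            i = rng.choice(indices)  # sample with replacement from this class
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


# Hypothetical imbalanced training split: four negatives, one positive.
X_train = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y_train = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X_train, y_train)
print(Counter(y_bal))  # Counter({0: 4, 1: 4})
```

Oversampling belongs after the train/test split and only on the training portion; otherwise duplicated rows can appear in both splits and inflate test accuracy.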


Stronger Baseline Models -- A Key Requirement for Aligning Machine Learning Research with Clinical Utility

Wolfrath, Nathan, Wolfrath, Joel, Hu, Hengrui, Banerjee, Anjishnu, Kothari, Anai N.

arXiv.org Artificial Intelligence

Machine Learning (ML) research has increased substantially in recent years, due to the success of predictive modeling across diverse application domains. However, well-known barriers exist when attempting to deploy ML models in high-stakes, clinical settings, including lack of model transparency (or the inability to audit the inference process), large training data requirements with siloed data sources, and complicated metrics for measuring model utility. In this work, we show empirically that including stronger baseline models in healthcare ML evaluations has important downstream effects that aid practitioners in addressing these challenges. Through a series of case studies, we find that the common practice of omitting baselines or comparing against a weak baseline model (e.g. a linear model with no optimization) obscures the value of ML methods proposed in the research literature. Using these insights, we propose some best practices that will enable practitioners to more effectively study and deploy ML models in clinical settings.